Initial Installation¶

Packages you'll need installed:

  • opencv / cv2 : pip install opencv-python
  • ffmpeg (for converting videos)
In [10]:
import pandas as pd
import numpy as np
import cv2
import matplotlib.pyplot as plt

from glob import glob

import IPython.display as ipd
from tqdm import tqdm

import subprocess

plt.style.use('ggplot')
In [11]:
pwd
Out[11]:
'C:\\Users\\H263429\\OneDrive - Halliburton\\Desktop\\Git\\Computer_vision\\Image_tracking-Video-'

Converting video types¶

Use ffmpeg to convert mov to mp4.

In [13]:
# Note: this cell may fail in VS Code; most likely the working directory differs from the notebook's folder, so ffmpeg cannot find movie.mov.

input_file = 'movie.mov'
subprocess.run(['ffmpeg',
                '-i',
                input_file,
                '-qscale',
                '0',
                'movie_mp4_converted.mp4',
                '-loglevel',
                'quiet']
              )
Out[13]:
CompletedProcess(args=['ffmpeg', '-i', 'movie.mov', '-qscale', '0', 'movie_mp4_converted.mp4', '-loglevel', 'quiet'], returncode=0)
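Note that subprocess.run returns a CompletedProcess even when the command fails, so it is worth checking returncode (or passing check=True) before trusting the output file. A minimal sketch of that pattern, using the Python interpreter as a stand-in command since ffmpeg may not be on PATH everywhere:

```python
import subprocess
import sys

# Run a command and inspect its exit status; sys.executable stands in for ffmpeg here.
result = subprocess.run([sys.executable, '-c', 'print("ok")'],
                        capture_output=True, text=True)
if result.returncode != 0:
    # With check=True, subprocess.run would raise CalledProcessError instead.
    print('command failed:', result.stderr)
else:
    print(result.stdout.strip())
```

With capture_output=True the command's stdout/stderr are available on the result instead of being printed directly, which is handy for logging ffmpeg errors.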
In [21]:
#This command lists the contents of a directory, but unfortunately the ls command is not available in the Windows command prompt.

!ls -GFlash --color

#Explanation:
#     -GFlash: These flags modify the behavior of the ls command. Each letter is a separate option:

#         -G: On BSD/macOS ls this enables colorized output (on GNU ls it instead hides the group column).
#         -F: Appends a character to each listed entry indicating its type (e.g., a trailing slash / denotes a directory).
#         -l: Displays detailed information about each entry: permissions, ownership, size, and modification date.
#         -a: Shows all files, including hidden ones (those starting with a dot).
#         -s: Prints the allocated size of each file in blocks.
#         -h: Prints sizes in human-readable units (K, M, G) when combined with -l or -s.
#     --color: Enables colorized output (GNU ls).
'ls' is not recognized as an internal or external command,
operable program or batch file.

Display Video in Notebook¶

In [24]:
ipd.Video('movie_mp4_converted.mp4', width=700)
Out[24]:
(embedded video player output)

Open the Video and Read Metadata¶

In [25]:
# Load in video capture
cap = cv2.VideoCapture('movie_mp4_converted.mp4')
In [27]:
# Total number of frames in video
cap.get(cv2.CAP_PROP_FRAME_COUNT)
Out[27]:
2398.0
In [28]:
# Video height and width
height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
print(f'Height {height}, Width {width}')

# So we can see that this is standard 720p HD (1280*720)
Height 720.0, Width 1280.0
In [29]:
# Get frames per second
fps = cap.get(cv2.CAP_PROP_FPS)
print(f'FPS : {fps:0.2f}')
FPS : 59.94
In [31]:
cap.release()

# The cap.release() method is called after video capture to release the resources associated with the video capture object. 
# When you open a video file or start capturing from a camera using cv2.VideoCapture(), system resources such as file descriptors
# or camera streams are allocated to handle the video capture.

# Calling cap.release() explicitly ensures that these resources are released properly when you are done with the video capture. 
# This is especially important if you are working with multiple video capture instances or if you want to free up system 
# resources for other processes.
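The release pattern above can be wrapped in a context manager so the capture is released even if an exception occurs mid-loop. A minimal sketch (the helper name managed_capture is my own, not an OpenCV API; in real use you would pass it a cv2.VideoCapture):

```python
from contextlib import contextmanager

@contextmanager
def managed_capture(cap):
    # Yields any object with a .release() method and guarantees release on exit,
    # even if the body raises.
    try:
        yield cap
    finally:
        cap.release()

# Quick check with a stand-in object instead of a real cv2.VideoCapture:
class DummyCap:
    def __init__(self):
        self.released = False
    def release(self):
        self.released = True

dummy = DummyCap()
with managed_capture(dummy) as c:
    pass
print(dummy.released)  # True
```

Usage with OpenCV would look like `with managed_capture(cv2.VideoCapture('movie_mp4_converted.mp4')) as cap: ...`, and cap.release() then never needs to be called by hand.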

Pulling in Images from Video¶

In [32]:
cap = cv2.VideoCapture('movie_mp4_converted.mp4')
ret, img = cap.read()
print(f'Returned {ret} and img of shape {img.shape}')
Returned True and img of shape (720, 1280, 3)
In [37]:
## Helper function for plotting OpenCV images in the notebook; displaying the raw array with matplotlib would show the colours wrong, because OpenCV stores images in BGR order.
def display_cv2_img(img, figsize=(10, 10)):
    # Convert from BGR (OpenCV's default) to RGB, which matplotlib expects.
    img_ = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # fig is the whole figure; ax is the individual subplot.
    fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(img_)
    ax.axis("off")   # Hide the axis labels and ticks.
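As a sanity check on why this conversion matters: for a plain 3-channel image, BGR-to-RGB is just a channel reversal, so it can be mimicked with NumPy slicing (cv2.cvtColor remains preferable in real code, since it handles many other colour-space conversions):

```python
import numpy as np

# A tiny 1x1 "image": pure red in BGR order is (0, 0, 255).
bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB order.
rgb = bgr[..., ::-1]
print(rgb[0, 0].tolist())  # [255, 0, 0] -> red in RGB
```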
In [38]:
display_cv2_img(img)
In [39]:
cap.release()

Display multiple frames from the video¶

In [42]:
fig, axs = plt.subplots(5, 5, figsize=(30, 20))
axs = axs.flatten()  # plt.subplots(5, 5) returns a 2D array of axes objects; flattening it to 1D lets us index the subplots with a single counter.

cap = cv2.VideoCapture("movie_mp4_converted.mp4")
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

img_idx = 0
for frame in range(n_frames):
    ret, img = cap.read()
    if not ret:
        break
    if frame % 100 == 0:
        axs[img_idx].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        axs[img_idx].set_title(f'Frame: {frame}')
        axs[img_idx].axis('off')
        img_idx += 1

plt.tight_layout()
plt.show()
cap.release()

Add Annotations to Video Images¶

In [76]:
labels = pd.read_csv('mot_labels.csv', low_memory=False)   # By default pandas reads large CSVs in chunks and infers dtypes per chunk, which can produce mixed dtypes within a column and a DtypeWarning. low_memory=False processes the file in one pass so each column gets a single, consistently inferred dtype.
video_labels = ( labels.query('videoName == "026c7465-309f6d33"').reset_index(drop=True).copy() )
video_labels["video_frame"] = (video_labels["frameIndex"] * 11.9).round().astype("int")  # The labels and the video are not at the same frame rate: the video runs at ~59.94 fps while the labels were annotated at only ~5 fps (201 label frames for 2398 video frames). Multiplying by the ratio (~11.9) maps label indices onto video frames: the last label index (201) becomes 2392, close to the video's frame count (2398).
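The 11.9 factor is just the ratio of the two frame counts; a quick check of the mapping, using the numbers observed in this notebook:

```python
# Values observed above: video fps, total video frames, and label keyframes.
video_fps = 59.94
n_video_frames = 2398
n_label_frames = 201

# The per-index scale factor is the ratio of frame counts.
scale = n_video_frames / n_label_frames   # ~11.93, close to the 11.9 used above

# Effective annotation rate of the labels.
label_fps = video_fps / scale             # ~5 fps

# Mapping the last label index onto a video frame.
last_mapped = round(201 * 11.9)

print(f'scale ~ {scale:.2f}, label rate ~ {label_fps:.1f} fps, last mapped frame = {last_mapped}')
```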
In [16]:
video_labels["category"].value_counts()
Out[16]:
car              3030
pedestrian        847
bicycle           381
rider             320
truck             194
other vehicle     115
bus               109
other person       74
motorcycle         67
trailer            34
Name: category, dtype: int64
In [77]:
# Pull frame 1035 (frame 1035 has maximum number of boxes)

cap = cv2.VideoCapture("movie_mp4_converted.mp4")
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

img_idx = 0
for frame in range(n_frames):
    ret, img = cap.read()
    if not ret:
        break
    if frame == 1035:
        break
cap.release()
In [78]:
display_cv2_img(img)
In [85]:
img_example = img.copy()
frame_labels = video_labels.query('video_frame == 1035')
# for i, d in frame_labels.iterrows(): iterates over each row of the DataFrame:
#   i is the index label of the current row,
#   d is the row data as a pandas Series.


for i, d in frame_labels.iterrows():
    pt1 = int(d['box2d.x1']), int(d['box2d.y1'])
    pt2 = int(d['box2d.x2']), int(d['box2d.y2'])
    cv2.rectangle(img_example, pt1, pt2, (0, 0, 255), 3)   # (0, 0, 255) is BGR (blue and green are zero, so the box is red); 3 is the line thickness in pixels.

display_cv2_img(img_example)

Displaying Colored by Category¶

In [86]:
color_map = {
    "car": (0, 0, 255),
    "truck": (0, 0, 100),
    "pedestrian": (255, 0, 0),
    "other vehicle": (0, 0, 150),
    "rider": (200, 0, 0),
    "bicycle": (0, 255, 0),
    "other person": (200, 0, 0),
    "trailer": (0, 150, 150),
    "motorcycle": (0, 150, 0),
    "bus": (0, 0, 100),
}

img_example = img.copy()
frame_labels = video_labels.query('video_frame == 1035')
for i, d in frame_labels.iterrows():
    pt1 = int(d['box2d.x1']), int(d['box2d.y1'])
    pt2 = int(d['box2d.x2']), int(d['box2d.y2'])
    color = color_map[d['category']]
    cv2.rectangle(img_example, pt1, pt2, color, 3)

display_cv2_img(img_example)

Adding Text¶

In [98]:
frame_labels = video_labels.query("video_frame == @frame")   # frame is still 1035 from the earlier loop
font = cv2.FONT_HERSHEY_TRIPLEX
img_example = img.copy()
for i, d in frame_labels.iterrows():
    pt1 = int(d["box2d.x1"]), int(d["box2d.y1"])
    pt2 = int(d["box2d.x2"]), int(d["box2d.y2"])
    color = color_map[d["category"]]
    img_example = cv2.rectangle(img_example, pt1, pt2, color, 3)
    pt_text = int(d["box2d.x1"]) + 5, int(d["box2d.y1"] + 10)
    img_example = cv2.putText(img_example, d["category"], pt_text, font, 0.5, color)
display_cv2_img(img_example)
cap.release()

Label and output Annotated Video¶

In [106]:
def add_annotations(img, frame, video_labels):
    max_frame = video_labels.query("video_frame <= @frame")["video_frame"].max()  # Labels exist only every ~12 video frames, so take the most recent labelled frame at or before this one; its boxes persist until the next annotated frame.
    frame_labels = video_labels.query("video_frame == @max_frame")
    for i, d in frame_labels.iterrows():
        pt1 = int(d["box2d.x1"]), int(d["box2d.y1"])
        pt2 = int(d["box2d.x2"]), int(d["box2d.y2"])
        color = color_map[d["category"]]
        img = cv2.rectangle(img, pt1, pt2, color, 3)
    return img
In [107]:
VIDEO_CODEC = "mp4v"
fps = 59.94
width = 1280
height = 720
out = cv2.VideoWriter("out_test.mp4",
                cv2.VideoWriter_fourcc(*VIDEO_CODEC),
                fps,
                (width, height))

cap = cv2.VideoCapture("movie_mp4_converted.mp4")
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

for frame in tqdm(range(n_frames), total=n_frames):     # tqdm shows the progress bar
    ret, img = cap.read()
    if not ret:
        break
    img = add_annotations(img, frame, video_labels)
    out.write(img)
out.release()
cap.release()
100%|██████████████████████████████████████████████████████████████████████████████| 2398/2398 [00:33<00:00, 71.15it/s]

Convert our labeled output to mp4 and view¶

In [108]:
tmp_output_path = "out_test.mp4"
output_path = "out_test_compressed.mp4"
subprocess.run(
    [
        "ffmpeg",
        "-i",
        tmp_output_path,
        "-crf",
        "18",
        "-preset",
        "veryfast",
        "-vcodec",
        "libx264",
        output_path,
        '-loglevel',
        'quiet'
    ]
)
Out[108]:
CompletedProcess(args=['ffmpeg', '-i', 'out_test.mp4', '-crf', '18', '-preset', 'veryfast', '-vcodec', 'libx264', 'out_test_compressed.mp4', '-loglevel', 'quiet'], returncode=0)
In [109]:
ipd.Video('out_test_compressed.mp4', width=600)
Out[109]:
(embedded video player output)